Improving fuzzy clustering of biological data by metric learning with side information
Authors
Abstract
Semi-supervised methods use a small amount of auxiliary information as a guide in the learning process in the presence of unlabeled data. When a clustering algorithm is used, the auxiliary information takes the form of side information, that is, a list of co-clustered points. Recent literature shows that these methods outperform totally unsupervised ones even with a small amount of side information. This suggests that semi-supervised methods may be especially useful in very difficult and noisy tasks where little a priori information is available, as is the case for data deriving from biological experiments. The two paradigms most frequently used to include side information in clustering are constrained clustering and metric learning. In this paper we use a metric learning approach to improve classical fuzzy c-means clustering through a two-step procedure: first, a set of metrics (one for each cluster) satisfying a randomly generated set of constraints is learnt from the data; then a generalized version of fuzzy c-means (using the metrics computed in the previous step) is executed. We show the benefits and the limitations of this method using real-world datasets and a modified version of the Partition Entropy index. © 2007 Elsevier Inc. All rights reserved.
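The following is a minimal sketch, in Python with NumPy, of the two-step procedure the abstract describes; it is not the authors' implementation. The metric-learning step here is a simple heuristic that inverts the covariance of the difference vectors of must-link pairs, so that co-clustered points become close under the learned Mahalanobis matrix; the paper's actual optimization, the function names learn_metric and fuzzy_cmeans, and the choice of a single shared metric for all clusters are assumptions made for illustration. The second step is a standard fuzzy c-means in which the distance to each cluster centre is computed with that cluster's own metric.

```python
import numpy as np

def mahalanobis_sq(X, center, A):
    """Squared Mahalanobis distance of every row of X to `center` under metric A."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, A, diff)

def learn_metric(X, must_link_pairs, eps=1e-6):
    """Heuristic metric learning (an assumption, not the paper's method):
    whiten the difference vectors of co-clustered (must-link) pairs so that
    constrained points are close under the returned Mahalanobis matrix."""
    diffs = np.array([X[i] - X[j] for i, j in must_link_pairs])
    S = diffs.T @ diffs / len(diffs) + eps * np.eye(X.shape[1])
    return np.linalg.inv(S)

def fuzzy_cmeans(X, c, metrics=None, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Generalized fuzzy c-means: cluster k measures distances with its own metric A_k."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if metrics is None:
        metrics = [np.eye(d)] * c          # fall back to classical Euclidean FCM
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)      # random fuzzy memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # squared distance of every point to every centre, using per-cluster metrics
        D = np.stack([mahalanobis_sq(X, centers[k], metrics[k]) for k in range(c)], axis=1)
        D = np.maximum(D, 1e-12)
        inv = D ** (-1.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # three noisy 2-D blobs standing in for a biological dataset
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
                   for c in ([0, 0], [3, 0], [0, 3])])
    # a few randomly generated co-clustered (must-link) pairs
    pairs = [(0, 5), (51, 60), (101, 110)]
    A = learn_metric(X, pairs)
    centers, U = fuzzy_cmeans(X, c=3, metrics=[A] * 3)
    print(centers)
```

Learning a genuinely different metric per cluster would require grouping the constraints by cluster, a detail the abstract does not specify, so this sketch shares one learned metric across all clusters.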
Related articles
Composite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...
Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques
Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Relaxational metric adaptation and its application to semi-supervised clustering and content-based image retrieval
The performance of many supervised and unsupervised learning algorithms is very sensitive to the choice of an appropriate distance metric. Previous work in metric learning and adaptation has mostly been focused on classification tasks by making use of class label information. In standard clustering tasks, however, class label information is not available. In order to adapt the metric to improve...
Journal: Int. J. Approx. Reasoning
Volume: 47
Pages: -
Publication year: 2008